Dataset statistics
| Number of variables | 4 |
|---|---|
| Number of observations | 2441666 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 392002 |
| Duplicate rows (%) | 16.1% |
| Total size in memory | 74.5 MiB |
| Average record size in memory | 32.0 B |
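The percentage and memory figures above can be cross-checked with simple arithmetic. The sketch below assumes each of the four numeric columns is stored as an 8-byte value (for example `int64` or `float64`); that storage width is an assumption, not something the report states.

```python
# Figures copied from the "Dataset statistics" table.
n_rows = 2_441_666
n_duplicates = 392_002
n_variables = 4

# Assumption: every numeric column uses 8 bytes per value.
bytes_per_value = 8

duplicate_pct = 100 * n_duplicates / n_rows          # share of duplicate rows
record_bytes = n_variables * bytes_per_value         # bytes per row
total_mib = n_rows * record_bytes / 2**20            # whole dataset, in MiB
column_mib = n_rows * bytes_per_value / 2**20        # one column, in MiB

print(f"duplicate rows: {duplicate_pct:.1f}%")
print(f"record size:    {record_bytes} B")
print(f"total size:     {total_mib:.1f} MiB")
print(f"per column:     {column_mib:.1f} MiB")
```

Under that assumption the results agree with the reported 16.1%, 32.0 B, and 74.5 MiB, and with the 18.6 MiB memory size listed for each variable.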
Variable types
| NUM | 4 |
|---|---|
Reproduction
| Analysis started | 2020-03-29 20:03:48.959259 |
|---|---|
| Analysis finished | 2020-03-29 23:45:19.645671 |
| Version | pandas-profiling v2.5.0 |
| Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
| Download configuration | config.yaml |
Warnings
| Dataset has 392002 (16.1%) duplicate rows | Duplicates |
|---|---|
| FRECUENCY is highly skewed (γ1 = 29.65483338) | Skewed |
| MONETARY is highly skewed (γ1 = 530.5060113) | Skewed |
| AVGMONETARY is highly skewed (γ1 = 368.8070561) | Skewed |
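For readers unfamiliar with γ1, the following is a minimal sketch of the moment coefficient of skewness, computed as the third central moment divided by the 1.5 power of the second. The report does not say which estimator it uses; pandas' `Series.skew` applies a bias adjustment, which this sketch omits, so treat it as an illustration rather than a reproduction. The sample lists are made up.

```python
def skewness(values):
    """Unadjusted moment coefficient of skewness: g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    return m3 / m2 ** 1.5

# Made-up illustrative data, not taken from the profiled dataset.
symmetric = [1, 2, 3, 4, 5]
long_tail = [1, 1, 1, 2, 2, 3, 50]   # one large value stretches the right tail

print(skewness(symmetric))   # zero for symmetric data
print(skewness(long_tail))   # clearly positive for a long right tail
```

A single extreme value is enough to push γ1 well above zero, which is consistent with the very large maxima reported for FRECUENCY, MONETARY, and AVGMONETARY.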
RECENCY
Real number (ℝ≥0)
| Distinct count | 1845 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 511.2430815680769 |
|---|---|
| Minimum | 3 |
| Maximum | 1847 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 18.6 MiB |
Quantile statistics
| Minimum | 3 |
|---|---|
| 5-th percentile | 13 |
| Q1 | 87 |
| median | 323 |
| Q3 | 833 |
| 95-th percentile | 1537 |
| Maximum | 1847 |
| Range | 1844 |
| Interquartile range (IQR) | 746 |
Descriptive statistics
| Standard deviation | 499.7476266 |
|---|---|
| Coefficient of variation (CV) | 0.9775146982 |
| Kurtosis | -0.302109578 |
| Mean | 511.2430816 |
| Median Absolute Deviation (MAD) | 420.2925659 |
| Skewness | 0.9184276292 |
| Sum | 1248284850 |
| Variance | 249747.6903 |
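Several RECENCY figures are derived from other entries in the tables above. As a sanity check, the sketch below recomputes them from the reported values only; no raw data is involved, and the variable names are chosen here for illustration.

```python
# Values copied from the RECENCY tables and the dataset overview.
minimum, maximum = 3, 1847
q1, q3 = 87, 833
mean = 511.2430816
std = 499.7476266
total = 1248284850
n_rows = 2441666             # "Number of observations"

value_range = maximum - minimum    # reported Range: 1844
iqr = q3 - q1                      # reported IQR: 746
cv = std / mean                    # reported CV: 0.9775146982
variance = std ** 2                # reported Variance: 249747.6903
mean_from_sum = total / n_rows     # reported Mean: 511.2430816

print(value_range, iqr)
print(round(cv, 6), round(variance, 2), round(mean_from_sum, 4))
```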
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 3. 3.5 4.5 5.5 6.5 ... 1837.5 1841.5 1843.5 1844.5 1847. ], "bayesian blocks" binning strategy used)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 5 | 18389 | 0.8% |
| 7 | 18283 | 0.7% |
| 6 | 16820 | 0.7% |
| 16 | 15659 | 0.6% |
| 14 | 15141 | 0.6% |
| 33 | 14668 | 0.6% |
| 8 | 13846 | 0.6% |
| 20 | 13165 | 0.5% |
| 19 | 13048 | 0.5% |
| 9 | 12977 | 0.5% |
| Other values (1835) | 2289670 | 93.8% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 420 | < 0.1% |
| 4 | 2298 | 0.1% |
| 5 | 18389 | 0.8% |
| 6 | 16820 | 0.7% |
| 7 | 18283 | 0.7% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1847 | 147 | < 0.1% |
| 1846 | 248 | < 0.1% |
| 1845 | 277 | < 0.1% |
| 1844 | 187 | < 0.1% |
| 1843 | 467 | < 0.1% |
FRECUENCY
Real number (ℝ≥0)
| Distinct count | 911 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 9.534448200531932 |
|---|---|
| Minimum | 1 |
| Maximum | 6823 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 18.6 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 3 |
| Q3 | 8 |
| 95-th percentile | 39 |
| Maximum | 6823 |
| Range | 6822 |
| Interquartile range (IQR) | 7 |
Descriptive statistics
| Standard deviation | 27.01969983 |
|---|---|
| Coefficient of variation (CV) | 2.833902839 |
| Kurtosis | 3425.861139 |
| Mean | 9.534448201 |
| Median Absolute Deviation (MAD) | 10.72564128 |
| Skewness | 29.65483338 |
| Sum | 23279938 |
| Variance | 730.0641788 |
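One figure in this table deserves a second look. With a minimum and Q1 of 1, a median of 3, and a Q3 of 8, at least three quarters of the values lie within 5 of the median, so a median absolute deviation could not exceed 5; the reported 10.73 therefore looks like a mean absolute deviation instead. That reading is an inference from the numbers, not something the report states. The sketch below contrasts the two measures on made-up data.

```python
from statistics import mean, median

def mean_abs_deviation(values):
    """Average distance of the values from their mean."""
    centre = mean(values)
    return mean([abs(v - centre) for v in values])

def median_abs_deviation(values):
    """Median distance of the values from their median."""
    centre = median(values)
    return median([abs(v - centre) for v in values])

# Made-up, heavily right-skewed sample (not the profiled data).
sample = [1, 1, 1, 2, 3, 3, 8, 8, 40, 200]

print(mean_abs_deviation(sample))     # pulled up by the two large values
print(median_abs_deviation(sample))   # stays small: most values sit near 3
```

On skewed data the two measures can differ by an order of magnitude, which is the pattern seen in the FRECUENCY, MONETARY, and AVGMONETARY tables.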
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.000e+00 1.500e+00 2.500e+00 3.500e+00 4.500e+00 ... 8.530e+02 1.105e+03 1.772e+03 3.577e+03 6.823e+03], "bayesian blocks" binning strategy used)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 857451 | 35.1% |
| 2 | 348301 | 14.3% |
| 3 | 203692 | 8.3% |
| 4 | 140380 | 5.7% |
| 5 | 104092 | 4.3% |
| 6 | 82318 | 3.4% |
| 7 | 67081 | 2.7% |
| 8 | 55950 | 2.3% |
| 9 | 47404 | 1.9% |
| 10 | 40677 | 1.7% |
| Other values (901) | 494320 | 20.2% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 857451 | 35.1% |
| 2 | 348301 | 14.3% |
| 3 | 203692 | 8.3% |
| 4 | 140380 | 5.7% |
| 5 | 104092 | 4.3% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 6823 | 1 | < 0.1% |
| 4563 | 1 | < 0.1% |
| 4309 | 1 | < 0.1% |
| 3600 | 1 | < 0.1% |
| 3554 | 1 | < 0.1% |
MONETARY
Real number (ℝ≥0)
| Distinct count | 469481 |
|---|---|
| Unique (%) | 19.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 37852.1625447625 |
|---|---|
| Minimum | 0.0 |
| Maximum | 3971862682.06 |
| Zeros | 151 |
| Zeros (%) | < 0.1% |
| Memory size | 18.6 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 20.65 |
| median | 126.08 |
| Q3 | 696.9875 |
| 95-th percentile | 17200 |
| Maximum | 3971862682 |
| Range | 3971862682 |
| Interquartile range (IQR) | 676.3375 |
Descriptive statistics
| Standard deviation | 4231783.144 |
|---|---|
| Coefficient of variation (CV) | 111.7976586 |
| Kurtosis | 395786.1512 |
| Mean | 37852.16254 |
| Median Absolute Deviation (MAD) | 70736.5076 |
| Skewness | 530.5060113 |
| Sum | 9.242233831e+10 |
| Variance | 1.790798857e+13 |
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.00000000e+00 5.00000000e-03 1.50000000e-02 2.50000000e-02 3.50000000e-02 ... 4.21185316e+07 8.18541412e+07 1.83170762e+08 6.11292195e+08 3.97186268e+09], "bayesian blocks" binning strategy used)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 56019 | 2.3% |
| 100 | 35227 | 1.4% |
| 200 | 33104 | 1.4% |
| 20 | 28916 | 1.2% |
| 40 | 16833 | 0.7% |
| 60 | 13966 | 0.6% |
| 10 | 12243 | 0.5% |
| 800 | 11602 | 0.5% |
| 2 | 11393 | 0.5% |
| 0.02 | 11186 | 0.5% |
| Other values (469471) | 2211177 | 90.6% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 151 | < 0.1% |
| 0.01 | 7594 | 0.3% |
| 0.02 | 11186 | 0.5% |
| 0.03 | 1311 | 0.1% |
| 0.04 | 1084 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 3971862682 | 1 | < 0.1% |
| 2189348945 | 1 | < 0.1% |
| 2181740452 | 1 | < 0.1% |
| 1509163532 | 1 | < 0.1% |
| 1207394588 | 1 | < 0.1% |
AVGMONETARY
Real number (ℝ≥0)
| Distinct count | 799770 |
|---|---|
| Unique (%) | 32.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1632.0347265716673 |
|---|---|
| Minimum | 0.0 |
| Maximum | 60613785.83 |
| Zeros | 151 |
| Zeros (%) | < 0.1% |
| Memory size | 18.6 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.92 |
| Q1 | 8.330866935 |
| median | 37 |
| Q3 | 157 |
| 95-th percentile | 2336.824607 |
| Maximum | 60613785.83 |
| Range | 60613785.83 |
| Interquartile range (IQR) | 148.6691331 |
Descriptive statistics
| Standard deviation | 86711.38196 |
|---|---|
| Coefficient of variation (CV) | 53.1308437 |
| Kurtosis | 188563.2826 |
| Mean | 1632.034727 |
| Median Absolute Deviation (MAD) | 2844.801142 |
| Skewness | 368.8070561 |
| Sum | 3984883703 |
| Variance | 7518863762 |
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.00000000e+00 5.00000000e-03 1.01562500e-02 1.45000000e-02 1.55000000e-02 ... 1.10265705e+06 2.25993069e+06 5.00003692e+06 1.32426018e+07 6.06137858e+07], "bayesian blocks" binning strategy used)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 62254 | 2.5% |
| 100 | 36197 | 1.5% |
| 200 | 33216 | 1.4% |
| 20 | 29517 | 1.2% |
| 9.99 | 19363 | 0.8% |
| 40 | 16090 | 0.7% |
| 11.99 | 15497 | 0.6% |
| 60 | 14070 | 0.6% |
| 10 | 12785 | 0.5% |
| 800 | 11410 | 0.5% |
| Other values (799760) | 2191267 | 89.7% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 151 | < 0.1% |
| 0.01 | 8084 | 0.3% |
| 0.0103125 | 1 | < 0.1% |
| 0.01057692308 | 1 | < 0.1% |
| 0.011 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 60613785.83 | 1 | < 0.1% |
| 50000000 | 1 | < 0.1% |
| 35890966.31 | 1 | < 0.1% |
| 33090909.09 | 1 | < 0.1% |
| 30597459.46 | 1 | < 0.1% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, where -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a perfectly linear relationship the steepness of the line does not affect r; only the sign of the slope does. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
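The calculation described above can be sketched in plain Python as follows. The function name `pearson_r` and the sample lists are illustrative; population (divide-by-n) moments are used throughout, which gives the same r as sample moments because the factor cancels.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)

# Made-up data: two exact linear relationships with different slopes.
xs = [1, 2, 3, 4, 5]
steep = [10 * x + 7 for x in xs]      # positive slope
falling = [-0.5 * x + 2 for x in xs]  # negative slope

print(pearson_r(xs, steep))     # close to +1 regardless of steepness
print(pearson_r(xs, falling))   # close to -1 because the slope is negative
```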
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic relationships than Pearson's r. Its value lies between -1 and +1, where -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
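A minimal sketch of that recipe: replace each value by its rank, then apply the covariance-over-standard-deviations formula to the ranks. Tied values receive the average of the ranks they span, which is a common convention assumed here rather than one taken from the report. All names and data are illustrative.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def correlation(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)

def spearman_rho(xs, ys):
    return correlation(average_ranks(xs), average_ranks(ys))

# Made-up data: y grows monotonically but not linearly with x.
xs = [1, 2, 3, 4, 5]
cubes = [x ** 3 for x in xs]

print(spearman_rho(xs, cubes))   # close to 1: the relationship is monotonic
print(correlation(xs, cubes))    # below 1: the relationship is not linear
```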
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, where -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is then the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
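The pair-counting definition above translates directly into a short sketch. This is the simple variant (often called τ-a) in which tied pairs count as neither concordant nor discordant; library implementations typically apply a tie correction (τ-b), so results can differ when ties are present. The function name and data are illustrative.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """(concordant - discordant) / total number of pairs, without tie correction."""
    pairs = list(combinations(zip(xs, ys), 2))
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in pairs:
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # both variables move the same way
        elif direction < 0:
            discordant += 1      # the variables move in opposite ways
    return (concordant - discordant) / len(pairs)

# Made-up data: one adjacent pair is swapped, so 5 of 6 pairs are concordant.
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]

print(kendall_tau(xs, ys))
```

Because every pair of observations is compared, this approach is quadratic in the number of rows and is only practical for small samples.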
First rows
| | RECENCY | FRECUENCY | MONETARY | AVGMONETARY |
|---|---|---|---|---|
| 0 | 209 | 72 | 2,945.9 | 40.9 |
| 1 | 831 | 31 | 806,000.0 | 26,000.0 |
| 2 | 573 | 2 | 907.4 | 453.7 |
| 3 | 226 | 2 | 500.4 | 250.2 |
| 4 | 1400 | 8 | 371,042.5 | 46,380.3 |
| 5 | 523 | 1 | 1,115.1 | 1,115.1 |
| 6 | 141 | 2 | 21.7 | 10.9 |
| 7 | 33 | 99 | 12,422,280.0 | 125,477.6 |
| 8 | 51 | 140 | 8,055,781.5 | 57,541.3 |
| 9 | 833 | 8 | 465.7 | 58.2 |
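Reading the leading column as the row index, the sample rows suggest that AVGMONETARY equals MONETARY divided by FRECUENCY. The report does not state this relationship, so the sketch below only checks it against the ten rows shown, allowing for the one-decimal rounding of the displayed values.

```python
# (FRECUENCY, MONETARY, AVGMONETARY) copied from the first rows shown above.
first_rows = [
    (72, 2945.9, 40.9),
    (31, 806000.0, 26000.0),
    (2, 907.4, 453.7),
    (2, 500.4, 250.2),
    (8, 371042.5, 46380.3),
    (1, 1115.1, 1115.1),
    (2, 21.7, 10.9),
    (99, 12422280.0, 125477.6),
    (140, 8055781.5, 57541.3),
    (8, 465.7, 58.2),
]

# Displayed values are rounded to one decimal, so allow a small tolerance.
TOLERANCE = 0.1

mismatches = [
    (frequency, monetary, average)
    for frequency, monetary, average in first_rows
    if abs(monetary / frequency - average) > TOLERANCE
]

print(mismatches)   # an empty list means every sampled row fits the relationship
```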
Last rows
| | RECENCY | FRECUENCY | MONETARY | AVGMONETARY |
|---|---|---|---|---|
| 2441656 | 1818 | 1 | 1,760.2 | 1,760.2 |
| 2441657 | 1826 | 1 | 3,561.8 | 3,561.8 |
| 2441658 | 1820 | 1 | 6,701.6 | 6,701.6 |
| 2441659 | 1829 | 1 | 8,789.6 | 8,789.6 |
| 2441660 | 1841 | 2 | 1,411.0 | 705.5 |
| 2441661 | 1827 | 1 | 206.0 | 206.0 |
| 2441662 | 1841 | 1 | 1,081.0 | 1,081.0 |
| 2441663 | 1819 | 1 | 11,995.0 | 11,995.0 |
| 2441664 | 1822 | 2 | 9,875.9 | 4,938.0 |
| 2441665 | 1836 | 2 | 50,937.5 | 25,468.7 |